Networks are a simple but powerful representation of real-world complex systems. Network
community analysis has become an invaluable tool for exploring and revealing the internal
organization of nodes. However, only a few methods have been designed directly for community
detection in directed networks. In this article, we introduce the concept of local community
structure in directed networks and provide a generic criterion that describes a local community
through two properties. We further propose a stochastic optimization algorithm that rapidly
detects a local community and thereby uncovers the directional modular characteristics of
directed networks. Numerical results show that the proposed method can resolve detailed local
communities with directional information and reveal more structural characteristics of directed
networks than previous methods.



I. INTRODUCTION 

Networks consisting of nodes connected in pairs by
edges reveal essential features of the structure, function
and dynamics of many complex systems. Complex
networks have thus become invaluable tools in various fields
including sociology, biology and physics 00. Community
structure can aid in exploring the structure and
organization of networks, and in the past decade it has
attracted considerable attention. Many methods for resolving
community structure in undirected networks have been
developed (see Ref. [3] for a recent comprehensive review).
However, only a limited number of methods have been
designed for detecting community structure in directed
networks, and the direction of links poses new challenges
for defining community structure in such networks 00.

Directed networks show fundamentally different features
from those observed when the direction of their links is
ignored. Link direction carries important topological
information and is essential for describing the structure of
many complex systems. The effects of link direction on the
organization and dynamics of complex networks have attracted
great interest recently. For instance, link direction has
been shown to have a profound effect on the linking tendency
between nodes pj. Studies of community detection in
directed networks have shown that considering link direction
can shed light on key features of their community
structure 00.

How to describe the community structure of a directed
network is an open issue. Newman and Leicht 0 and
Guimera et al. 0 have defined a community as a group to
which nodes are assigned when they are linked to similar
neighbors. This definition is fundamentally different from
the general one used for undirected networks [3]. Moreover,
Rosvall and Bergstrom P have adopted an
information-theoretic method whose results show
characteristics distinct from those of the adapted
modularity maximization method. Leicht and Newman [5]
and Kim et al. 0 have each employed a generalized form of
modularity to identify community structure in directed
networks. However, like modularity-based partition methods
for undirected networks, such methods force every node
into a community and can therefore distort the real
structure of a network in which some nodes are only loosely
connected to any community. Moreover, the modularity
index pm has been shown to fail to find the most natural
community structure in undirected networks because of
the resolution limit mm, an issue that the adapted
modularity for directed networks would share.

More recently, the concept of a local community was
proposed for undirected networks [15, 16]. The key idea is
that, in a large network, a community comprises a limited
number of nodes and is characterized by the “local” links
within it and connecting to it. Determining one such local
community at a time differs in principle from partitioning
methods, which consider all the connections of a network.
There has not been much work in the literature on local
community detection, even for undirected networks.
Researchers have explored the community around a given
node, which relies on predefined knowledge [17, 18]. Zhao
et al. [15] proposed a community extraction framework that
considers only one community at a time by maximizing an
extraction criterion via a tabu search technique. This
promising idea, together with the resolution limit of the
proposed criterion, recently inspired a neurodynamic
framework with a generic criterion for resolving local
communities in undirected networks [16]. Given the
complexity of directionality and the intricate connections
between nodes, we adopt the “local” strategy here to
disassemble and study directed networks.

In this article, we introduce a generic quantitative
criterion to describe a “local” community in directed
networks (Figure 1). The criterion considers two
properties: (1) high density: the nodes in a community are
densely connected; (2) consistent directionality: the
directions of the links between a community and the rest of
the network should be as consistent as possible. Finding
sets of nodes that optimize this measure is in general a
computationally challenging problem. We adopt a Markov
chain Monte Carlo (MCMC) approach to sample sets of nodes
according to a distribution that gives significantly higher
probability to sets with high density and consistent
directionality. MCMC is a well-established technique for
sampling from combinatorial spaces, with applications in
various fields 0 0 including bioinformatics [21], [22]. In
general, the computation time (e.g., the number of
iterations) required by an MCMC approach is unknown. In
our case, we show empirically that our MCMC-type algorithm
converges rapidly to the stationary distribution and scales
well to networks with 10000 nodes. Numerical results show
that our local community extraction method can resolve
local communities with directional information and reveal
more structural characteristics of directed networks than
previous methods.

FIG. 1: Illustration of three methods for discovering community structure in a directed network. The network consists of 50
nodes. The first 20 nodes belong to a dense subnet in which links between members form independently with probability 0.7.
Links between members and the other 30 nodes, and links among the other 30 nodes, all form independently with probability
0.1. Directions are assigned at random to the links within the first 10 nodes, within the second 10 nodes, and within the other
30 nodes. All links between the first 10 nodes and the other 40 nodes point from the first 10 to the other 40, and all links
between the second 10 nodes and the other 40 nodes point from the other 40 to the second 10. Panels (b), (c) and (d) show,
respectively, a partition into three communities by the directed modularity maximization (DMM) of Leicht and Newman, the
undirected community extraction (UCE), which ignores the directionality of the network, and our directed community
extraction (DCE) method. Different colors represent the communities detected by each method. The two circled regions in (d)
mark the two true communities. If we consider only the 20 nodes of the dense subnet by removing the 30 background nodes, as
stated by Leicht and Newman, DMM can identify the two communities with a two-community partition (a). In the full network,
however, DMM has to balance the tightness of the three communities and as a result distorts the community structure (b).
UCE extracts the 20 nodes well as a dense community but fails to detect the directed communities (c). Our DCE method, on
the other hand, separates out the true communities perfectly (d).


II. METHODS 

Local community extraction problem in directed
networks. We first introduce the local community
extraction problem in undirected networks. Let
$G(V, E)$ denote an undirected network of $N$ nodes. The
network is represented by a symmetric adjacency matrix
$A = [A_{ij}]$ of size $N \times N$, where $A_{ij} > 0$ if there
is an edge between nodes $i$ and $j$ and $A_{ij} = 0$
otherwise. The positive $A_{ij}$ are the edge weights for
weighted networks and are set to 1 for unweighted networks.
The core idea of local community extraction is to look for
a set $S$ of nodes with a large number of links within itself
and a small number of links to the rest of the network.
This problem can be formulated as the optimization of a
quantitative function. Note that the links within the
complement $S^c$ of this set do not affect the value of this
function. Recently, we introduced a generic quantitative
criterion $W_S$ to describe local communities in undirected
networks [16], which adapts the one proposed in [15] by
adding a parameter $p$. The generalized criterion can
reveal multi-resolution community structure and overcomes
the resolution limit of the previous one. Specifically,
it is defined as follows,

$$W_S = |S|\,|S^p| \left( \frac{O_S}{|S|^2} - \frac{B_S}{|S|\,|S^p|} \right), \tag{1}$$

where $|S^p| = pN - |S|$, $p \le 1$ (with $p$ large enough
that $|S^p| > 0$), $O_S = \sum_{i,j \in S} A_{ij}$ and
$B_S = \sum_{i \in S, j \in S^c} A_{ij}$. Here $|S^p|$ can be
regarded as an estimate of the number of nodes in the rest
of the network that connect to the community $S$. When
$p = 1$, the criterion is the one proposed in [15]. The
term $O_S$ is twice the weight of the edges within $S$, and
$B_S$ is the weight of the connections between $S$ and the
rest of the network. The maximization of $W_S$ can be
solved efficiently by a powerful neurodynamic framework.
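
As a minimal sketch, the undirected criterion of Eq. (1) can be evaluated directly from an adjacency matrix. The function name `undirected_criterion`, the list-of-lists matrix layout and the toy network are illustrative assumptions and are not taken from the original method description.

```python
def undirected_criterion(A, S, p=1.0):
    """Evaluate W_S = |S||S^p| * (O_S/|S|^2 - B_S/(|S||S^p|)).

    A is a symmetric adjacency matrix (list of lists), S an iterable of
    node indices, and |S^p| = p*N - |S|.
    """
    N = len(A)
    S = set(S)
    size = len(S)
    size_p = p * N - size
    if size == 0 or size_p <= 0:
        raise ValueError("need a non-empty S with p*N - |S| > 0")
    # O_S: sum of A_ij over ordered pairs in S (twice the internal weight).
    O = sum(A[i][j] for i in S for j in S)
    # B_S: total weight of the links between S and its complement.
    B = sum(A[i][j] for i in S for j in range(N) if j not in S)
    return size * size_p * (O / size**2 - B / (size * size_p))


# Hypothetical toy network: a triangle {0, 1, 2} attached to the path 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A = [[0] * 5 for _ in range(5)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

w_triangle = undirected_criterion(A, {0, 1, 2})  # dense candidate set
w_path = undirected_criterion(A, {2, 3, 4})      # sparser candidate set
```

On this toy input the triangle scores higher than the path-like set, which matches the simplified form $W_S = |S^p|\, O_S / |S| - B_S$ obtained by expanding Eq. (1).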

Now we consider a “community” in a directed network
$G(V, E)$. The network can be represented by an asymmetric
adjacency matrix $A = [A_{ij}]$, where $A_{ij} > 0$ if
there is an edge directed from node $i$ to node $j$ and
$A_{ij} = 0$ otherwise. The key point is that the community
structure should reflect the “directionality” of the
directed network. The criterion above already accounts for
the density of a community and the sparseness of its
connections to the rest of the network. Here we incorporate
a parametric coefficient to capture the potential effect of
directions,


$$W_S^d = |S|\,|S^p| \left( \frac{O_S}{|S|^2} - q_d^{\,n}\, \frac{B_S}{|S|\,|S^p|} \right), \tag{2}$$


where $q_d = \frac{B_S + 1}{|B_S^{out} - B_S^{in}| + 1}$,
$O_S = \sum_{i,j \in S} A_{ij}$, $B_S = B_S^{in} + B_S^{out}$,
$B_S^{out} = \sum_{i \in S, j \in S^c} A_{ij}$,
$B_S^{in} = \sum_{i \in S^c, j \in S} A_{ij}$, and
$1 \le q_d \le B_S + 1$. If the directions of all links
between $S$ and $S^c$ are consistent, i.e., $B_S^{in} = 0$ or
$B_S^{out} = 0$, then $q_d$ equals 1 and has no effect on
the criterion $W_S^d$. If the directions are inconsistent,
$q_d$ becomes larger than 1, which amplifies the second
(penalty) term. Note that $n$ is a parameter that controls
the degree of penalty.
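
The effect of the directionality coefficient can be illustrated with a small sketch of the directed criterion in Eq. (2). The closed form used for $q_d$ below, $(B_S + 1)/(|B_S^{out} - B_S^{in}| + 1)$, is an assumption chosen to satisfy the stated properties ($q_d = 1$ for consistent directions and $1 \le q_d \le B_S + 1$); the function name and the toy networks are likewise hypothetical.

```python
def directed_criterion(A, S, p=1.0, n=1):
    """Evaluate W_S^d = |S||S^p| * (O_S/|S|^2 - q_d**n * B_S/(|S||S^p|)).

    A[i][j] > 0 means a link directed from node i to node j.
    """
    N = len(A)
    S = set(S)
    size = len(S)
    size_p = p * N - size
    if size == 0 or size_p <= 0:
        raise ValueError("need a non-empty S with p*N - |S| > 0")
    rest = [j for j in range(N) if j not in S]
    O = sum(A[i][j] for i in S for j in S)         # internal link weight
    B_out = sum(A[i][j] for i in S for j in rest)  # links leaving S
    B_in = sum(A[j][i] for i in S for j in rest)   # links entering S
    B = B_in + B_out
    # ASSUMED form of q_d: 1 when B_in == 0 or B_out == 0, at most B + 1.
    q_d = (B + 1) / (abs(B_out - B_in) + 1)
    return size * size_p * (O / size**2 - q_d**n * B / (size * size_p))


def network(links, N=5):
    """Build an N x N directed adjacency matrix from (source, target) pairs."""
    A = [[0] * N for _ in range(N)]
    for i, j in links:
        A[i][j] = 1
    return A


cycle = [(0, 1), (1, 2), (2, 0)]                    # S = {0, 1, 2}
consistent = network(cycle + [(0, 3), (1, 3)])      # both boundary links leave S
inconsistent = network(cycle + [(0, 3), (3, 1)])    # one leaves S, one enters S

w_consistent = directed_criterion(consistent, {0, 1, 2})
w_inconsistent = directed_criterion(inconsistent, {0, 1, 2})
```

Both toy networks have the same density and the same number of boundary links, so any difference between the two scores comes only from $q_d$; raising `n` widens that gap.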

We note that maximizing the generic criterion $W_S^d$ is
computationally difficult, and it is likely that no
efficient algorithm solves this problem exactly. Although
the neurodynamic framework has performed well, it is not
directly applicable to the current problem because of the
multiplier $q_d^n$. We therefore develop a stochastic
search procedure to solve this problem.

An MCMC approach. We introduce an MCMC approach to solve
the problem described above. The MCMC approach samples
sets of nodes, with the probability of sampling a node set
$S$ increasing with the objective weight $W_S^d$ of the
set. The frequencies with which node sets are sampled
therefore provide a ranking of node sets, ordered by
decreasing sampling frequency. In addition to the set with
the highest objective weight, one may thus also examine
other sets with high objective values (“suboptimal” sets)
that are nevertheless significant.

The basic idea of MCMC is to build a Markov chain
whose states are the subsets of nodes of the network and
to define transitions between states that differ by
one node. The Metropolis-Hastings algorithm provides
a general method for designing transition probabilities
that give a desired stationary distribution on the state
space. However, the Metropolis-Hastings method does
not guarantee fast convergence of the chain, which is a
necessary condition for practical use of the method.
Indeed, if the chain converges slowly, it may take an
impractically long time before the chain samples from the
desired distribution. Defining transition probabilities so
that the chain converges rapidly to the stationary
distribution remains a challenging and important task in
real applications. Despite significant progress in recent
years in developing mathematical tools for analyzing
convergence time [20], our ability to analyze useful chains
is still limited, and in practice most MCMC algorithms
rely on simulations to provide evidence of convergence to
stationarity 0.

We devise a Metropolis-Hastings algorithm to sample
sets $S$ of nodes with a stationary distribution that
is proportional to $e^{cW_S^d}$ for some $c > 0$. At time
$t$, the Markov chain in state $S_t$ chooses a node $u$ in
the neighborhood of $S_t$ and moves to the new state
$S_{t+1} = S_t \setminus \{u\}$ or $S_{t+1} = S_t \cup \{u\}$
with a certain probability. In general, there are no
guarantees on the rate of convergence of the
Metropolis-Hastings algorithm to the stationary
distribution. However, we demonstrate empirically that in
our case the chain converges rapidly, so that the
stationary distribution of a “local” subnet is reached in a
practical number of steps.

Algorithmic procedure

Initialization: Choose an arbitrary small subset $S_0$ of
the nodes of $G$.

Iteration: For $t = 0, 1, 2, \ldots$, obtain $S_{t+1}$ from
$S_t$ as follows (here $W(S)$ stands for the criterion
$W_S^d$):

1. Choose a node $u$ uniformly at random from $\bar{S}_t$
(the closure of $S_t$, i.e., $S_t \cup \{$all neighbor nodes
of $S_t\}$).

2. If $u \in S_t$, let
$P(S_t, u) = \min[1, e^{cW(S_t \setminus \{u\}) - cW(S_t)}]$.
With probability $P(S_t, u)$ set
$S_{t+1} = S_t \setminus \{u\}$, else $S_{t+1} = S_t$.

3. If $u \in \bar{S}_t \setminus S_t$, let
$P(S_t, u) = \min[1, e^{cW(S_t \cup \{u\}) - cW(S_t)}]$.
With probability $P(S_t, u)$ set
$S_{t+1} = S_t \cup \{u\}$, else $S_{t+1} = S_t$.
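
The procedure above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the closed form of $q_d$ is assumed, moves to sets where the criterion is undefined (empty sets, or $pN - |S| \le 0$) are simply rejected, and the function returns the best set visited instead of ranking sets by sampling frequency. All function names are hypothetical.

```python
import math
import random


def criterion(A, S, p=1.0, n=1):
    """Directed criterion W_S^d; returns -inf where it is undefined."""
    N = len(A)
    size, size_p = len(S), p * N - len(S)
    if size == 0 or size_p <= 0:
        return -math.inf
    rest = [j for j in range(N) if j not in S]
    O = sum(A[i][j] for i in S for j in S)
    B_out = sum(A[i][j] for i in S for j in rest)
    B_in = sum(A[j][i] for i in S for j in rest)
    B = B_in + B_out
    q_d = (B + 1) / (abs(B_out - B_in) + 1)  # ASSUMED closed form of q_d
    return size * size_p * (O / size**2 - q_d**n * B / (size * size_p))


def closure(A, S):
    """S together with every node linked to or from a node of S."""
    N = len(A)
    return S | {j for i in S for j in range(N) if A[i][j] or A[j][i]}


def mcmc_extract(A, S0, p=1.0, n=1, c=1.0, steps=2000, seed=0):
    """Run the chain from a non-empty S0; return the best set visited and its score."""
    rng = random.Random(seed)
    S = set(S0)
    w = criterion(A, S, p, n)
    best, best_w = set(S), w
    for _ in range(steps):
        u = rng.choice(sorted(closure(A, S)))  # step 1: uniform over the closure
        T = S - {u} if u in S else S | {u}     # steps 2-3: remove or add u
        w_new = criterion(A, T, p, n)
        if w_new == -math.inf:
            continue                           # assumption: reject undefined states
        accept = 1.0 if w_new >= w else math.exp(c * (w_new - w))
        if rng.random() < accept:
            S, w = T, w_new
            if w > best_w:
                best, best_w = set(S), w
    return best, best_w


# Hypothetical toy network: nodes 0-3 form a dense group whose boundary links
# all point outward to nodes 4 and 5; nodes 6-11 are sparse background.
A = [[0] * 12 for _ in range(12)]
for i, j in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
             (0, 4), (1, 5), (6, 7), (8, 9), (10, 11)]:
    A[i][j] = 1
best, best_w = mcmc_extract(A, {0}, steps=500, seed=1)
```

The acceptance rule follows steps 2 and 3 as written and does not add a Hastings correction for the changing size of the closure. The sketch recomputes the criterion from scratch at every step for clarity; an implementation aimed at networks with thousands of nodes would presumably update $O_S$, $B_S^{in}$ and $B_S^{out}$ incrementally.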


The MCMC method is efficient because the Markov chain
converges quickly to its “local” stationary distributions.
Our experiments show that the method scales well to large
networks with 10000 nodes (Figure 4).

Stop criterion. After a local community has been
determined, our method can be applied again to its
complement in the network to extract the next community.
Determining the number of local communities in a network is
a hard but practically important problem. In real
applications, we suggest evaluating the statistical
significance of a community by comparing its objective
value with those obtained from 100 random directed networks
generated by preserving the same set of nodes and the same
number of edges [23].
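
One way to turn this suggestion into a concrete test is sketched below. The text only states that the objective value is compared with that of 100 random directed networks; the empirical p-value with a plus-one correction, the restriction to unweighted networks, and the `best_score` callable (any routine that returns the best objective value found in a network) are assumptions made for illustration.

```python
import random


def random_directed_network(N, m, rng):
    """Random directed network with N nodes and m distinct links, no self-loops."""
    pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
    A = [[0] * N for _ in range(N)]
    for i, j in rng.sample(pairs, m):
        A[i][j] = 1
    return A


def empirical_p_value(observed, N, m, best_score, trials=100, seed=0):
    """Fraction of random networks whose best score reaches the observed one.

    observed   -- objective value of the community found in the real network
    best_score -- hypothetical callable mapping an adjacency matrix to the
                  best objective value found in it
    """
    rng = random.Random(seed)
    null_scores = [best_score(random_directed_network(N, m, rng))
                   for _ in range(trials)]
    exceed = sum(1 for w in null_scores if w >= observed)
    # ASSUMED convention: add one to numerator and denominator so that the
    # estimate is never exactly zero.
    return (exceed + 1) / (trials + 1)
```

A small p-value would suggest that the extracted community is unlikely to arise in a random network of the same size and link count, at which point further extraction from the complement could stop once communities cease to be significant.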
FIG. 2: Tests of local directed modularity optimization on the benchmark, shown as bar plots of the adjusted Jaccard similarity
coefficient. The number of nodes is $N = 500$ for (A) and $N = 1000$ for (B). Two communities of three different sizes
were embedded with two different settings of the parameters $p_1$ and $p_2$. Different values of the parameters $p$ and $n$
were tested. Each bar corresponds to an average over 20 network realizations.


III. RESULTS 


Numerical tests. We first test the directed community
extraction (DCE) criterion $W_S^d$ maximized by the
MCMC algorithm, and compare it on simulated directed
networks with the undirected community extraction (UCE)
criterion $W_S$, which ignores link direction [16], and
with the generalized directed modularity maximization
(DMM) method proposed by Leicht and Newman [5].
To compare grouping results against the independent
partitions defined by the embedded communities, we use the
adjusted Jaccard similarity coefficient as a measure of
agreement. The Jaccard similarity coefficient of two sets
is defined as the size of their intersection divided by
the size of their union:


$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}.$$

We simulate a directed network of $n_{12} + n_0$ nodes,
starting with a set $S_{12}$ of $n_{12}$ densely connected
nodes and a weakly connected background $S_0$ of $n_0$
nodes. Each pair of nodes within $S_{12}$ is linked
independently with probability $p_1$, and every other pair
of nodes (within $S_0$, or between $S_{12}$ and $S_0$) is
linked independently with probability $p_2$. Let $S_2$ be a
subset of $n_2$ nodes of $S_{12}$ and
$S_1 = S_{12} \setminus S_2$. Links within $S_1$, within
$S_2$ and within $S_0$ are assigned directions at random.
Links between $S_2$ and all other nodes point from $S_2$
to the others, while links between $S_1$ and all other
nodes point from the others to $S_1$ (see Figure 1 for an
example).
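
The benchmark construction can be sketched as follows. The node ordering ($S_1$ first, then $S_2$, then the background), the function name `directed_benchmark` and the use of an unweighted adjacency matrix are assumptions for illustration; the link probabilities follow the reading in which pairs inside the dense subnet use $p_1$ and all other pairs use $p_2$, as in the Figure 1 example.

```python
import random


def directed_benchmark(n1, n2, n0, p1, p2, seed=0):
    """Simulate a directed network with two embedded directed communities.

    Returns (A, group), where A[i][j] = 1 means a link from i to j and
    group[i] is 1 (S_1, receives links), 2 (S_2, sends links) or 0 (background).
    """
    rng = random.Random(seed)
    group = [1] * n1 + [2] * n2 + [0] * n0
    N = len(group)
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            dense = group[i] != 0 and group[j] != 0   # both in S_12
            if rng.random() >= (p1 if dense else p2):
                continue                              # no link for this pair
            if group[i] == group[j]:
                # Same block: direction assigned at random.
                src, dst = (i, j) if rng.random() < 0.5 else (j, i)
            elif 2 in (group[i], group[j]):
                # S_2 versus anything else: link points away from S_2.
                src, dst = (i, j) if group[i] == 2 else (j, i)
            else:
                # S_1 versus background: link points into S_1.
                src, dst = (j, i) if group[i] == 1 else (i, j)
            A[src][dst] = 1
    return A, group


# Sizes and probabilities of the Figure 1 example (50 nodes in total).
A, group = directed_benchmark(10, 10, 30, 0.7, 0.1, seed=1)
```

By construction no link enters $S_2$ from outside and no link leaves $S_1$, so both embedded communities have fully consistent boundary directions.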

FIG. 3: Comparison of three methods via bar plots of the adjusted
Jaccard similarity coefficient. The three methods are directed
modularity maximization (DMM), the local undirected
community extraction (UCE) method and our method (DCE).
The number of nodes is $N = 500$ for the top two subfigures and
$N = 1000$ for the bottom two. Two communities of sizes
$|S_1| = 40$ and $|S_2| = 50$ are embedded in the simulated
networks. Two connection settings are tested: $p_1 = 0.6$,
$p_2 = 0.05$ and $p_1 = 0.6$, $p_2 = 0.1$. Each bar corresponds
to 20 network realizations.

Given two communities $C_i$ ($i = 1, 2$) produced by a
method, we adopt the following definition of the adjusted
Jaccard similarity coefficient to measure accuracy in our
simulation study:

$$J(S, C) = \max_{i,j \in \{1,2\},\, i \ne j} \frac{1}{2} \left( \frac{|S_1 \cap C_i|}{|S_1 \cup C_i|} + \frac{|S_2 \cap C_j|}{|S_2 \cup C_j|} \right).$$

It is the normalized maximum over all possible sums of the
Jaccard similarity coefficients of the two pairs of
local communities. When the measure equals 1, the two true
groups are perfectly identified by the tested method.
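
A short sketch of the accuracy measure, assuming the communities are given as Python sets of node indices. Returning 0 for two empty sets is a convention chosen here, and the division by 2 reflects the normalization under which a perfect match scores 1.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two collections of nodes."""
    a, b = set(a), set(b)
    union = a | b
    # Convention assumed here: two empty sets have similarity 0.
    return len(a & b) / len(union) if union else 0.0


def adjusted_jaccard(S1, S2, C1, C2):
    """Best matching of detected (C1, C2) to true (S1, S2), scaled to [0, 1]."""
    straight = jaccard(S1, C1) + jaccard(S2, C2)
    swapped = jaccard(S1, C2) + jaccard(S2, C1)
    return max(straight, swapped) / 2


# Hypothetical example: the detected communities are the true ones, swapped.
perfect = adjusted_jaccard({0, 1}, {2, 3}, {2, 3}, {0, 1})
# Hypothetical example: the two true communities are merged into one.
merged = adjusted_jaccard({0, 1}, {2, 3}, {0, 1, 2, 3}, {4})
```

Taking the maximum over the two possible matchings makes the score independent of the order in which a method reports its communities.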

We first apply our method to various networks with
different connection characteristics (determined by the
parameters $p_1$ and $p_2$) and test the effect of the
criterion parameters $p$ and $n$. We extract the first
two communities with our method and use them to calculate
the adjusted Jaccard similarity coefficient. The results
clearly depend on the benchmark parameters $p_1$ and $p_2$
and on the parameters $p$ and $n$ of the proposed directed
local modularity criterion $W_S^d$ (Figure 2).

The results are more accurate with $n = 5$ than with
$n = 1$, indicating that the parameter controlling the
degree of penalty is helpful. The results with $p = 0.6$
are also better than those with $p = 1$, suggesting that
using the number of all complementary nodes of a
community, as the original quantitative function does, is
problematic in some cases. In the following, we choose
$p = 0.8$ and $n = 5$ for further comparative analysis.

We further compare our method with the other two
methods. For a fair comparison, we extract two communities
with our method and with the undirected local community
extraction method, and we partition the network into three
parts with the directed modularity maximization method so
that one part is available for the background nodes
(Figure 3). Our method clearly performs best in all four
settings. Undirected community extraction usually merges
the two directed communities into one community and
extracts another “dense” subset as its second community.
Directed modularity maximization improves slightly for
denser communities, but it tends to add background nodes
to a community, which results in a poor overall adjusted
Jaccard similarity coefficient. For large-scale networks
the situation gets even worse because of the resolution
limit of modularity-type methods. In fact, even for a small
network with only 50 nodes, directed modularity
maximization cannot identify the embedded communities well
(Figure 1b). This is partly because the connectivity
within the background, and between the background and the
real communities, affects the (directed) modularity. If we
remove the background nodes and links, the directed
modularity method can identify the two communities
(Figure 1a). All these results show that the “local”
extraction strategy reduces the effect of background nodes
and improves on the performance of “partition”-type
community detection methods.

FIG. 4: Running time of our method for extracting one
community as a function of the network size, from
$N = 2000$ to $N = 10000$. Each bar corresponds to an
average over 20 network realizations.

The computational efficiency of the proposed method
can also be seen in the simulation study, in which we
applied our method to networks with up to 10000 nodes. The
experiments show that our method scales well (Figure 4).

Real applications. We further apply our method to
a directed sporting competition network of US universities
in American football during the 2005 season, which was
first used by Leicht and Newman (Figure 5) [5]. The nodes
represent the teams in the ‘Big Ten’ regional competition,
or ‘conference’, and the edges link pairs of teams that
played one another. The direction of each edge reflects the
win-loss relationship between the two competing teams,
i.e., each edge points from the winner to the loser of the
game. The traditional representation is undirected and may
miss important information. Our method and the directed
modularity maximization method both extract precisely a
community of four teams, all of which lost a majority of
their games. The undirected community extraction method
(UCE) and undirected modularity maximization fail to
identify it: because of the symmetric connectivity of all
nodes, they merely extract an arbitrary community of five
teams. This small network clearly demonstrates that edge
directions play a vital role in forming the community
structure of a network.

FIG. 5: Community detection in the small football network. (a) Two communities, separated by the vertical dashed line, are
identified by undirected modularity maximization. (b) The boxed nodes are the first community extracted by the local directed
extraction method; all teams in this community lost a majority of their games.

The small football network represents a regional
conference (the ‘Big Ten’ conference), which likely
corresponds to a community in the undirected network of the
whole country. We next apply our method, with $p = 1$ and
$n = 8$, to a directed football network of the whole
country to show its advantages (Figure 6). The
corresponding undirected version has been widely used as a
standard benchmark for evaluating community-detection
methods in undirected networks. The football network,
originally compiled by Girvan and Newman [25], contains the
games of American football played between Division IA
colleges during the regular season of Fall 2000. Each node
represents a team and each edge a game played between two
teams. The nodes are colored according to the conferences
to which they belong. Note that the assignments to
conferences, i.e., the node colors, were corrected
recently [26]. Here, we label the win-loss relationship
between the two competing teams of each game and construct
a directed football network to test our method.

Our method reveals a community structure very different
from the original conferences (or from the communities
detected computationally in the undirected version). We
have also applied DMM to this network; it identifies a
community structure similar to the one found by modularity
maximization on the undirected version in previous tests.
DMM thus fails to capture the directional information,
whereas the DCE method discovers distinct community
characteristics (Figure 6). For example, community 1
consists of 8 teams, each of which won most of its games
against all other teams; we may consider it a strong group.
The teams of community 3 lost most of their games, and we
may see it as a weak group. This community organization
reveals properties different from those of the original
conference organization. The exploration provides more
insight into the topological organization of this network
and enhances our understanding of its underlying
principles.


IV. CONCLUSION 

How to describe the community structure of directed
networks is an open issue in network science. It has
attracted researchers with a broad range of interests from
diverse fields including physics, sociology and biology. In
this article, we investigate the community structure
problem in directed networks from a “local” view. We
propose a new framework for recovering the local community
structure of directed networks by optimizing a generic
criterion via MCMC stochastic search. We further apply it
to both simulated and real networks to demonstrate that it
is able to recover known local community structure and to
reveal unexpected local patterns that cannot be recovered
when the direction information is ignored. The main purpose
of this article is to propose a new concept and theoretical
framework for analyzing the community structure of directed
networks, which sheds light on a network’s organization and
dynamics.
FIG. 6: Community identification in the directed football network by our method. Colors represent the original 11 conferences
and the 8 independent teams (soft red). The identified local communities are grouped in circles, and the number in the
shaded box gives the rank of their scores. Each extracted region represents a community in which all teams lost or won a
majority of their games against all others.